In [1]:
%load_ext load_style
%load_style talk.css


Read SST NetCDF data, Subsample and Save

This notebook carries out some basic operations:

  • Open a data file
  • Check variables
  • Index to subsample a variable
  • Save data

1. Load basic libraries


In [2]:
%matplotlib inline
import numpy as np  
from netCDF4 import Dataset # http://unidata.github.io/netcdf4-python/

2. Set input NetCDF file info

You can download the data with wget or aria2c under Linux. Here the data file has already been downloaded and placed in the data folder.


In [3]:
#!wget ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.derived/surface_gauss/skt.sfc.mon.mean.nc
ncfile = 'data/skt.mon.mean.nc' # forward slash works on Linux, macOS and Windows
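
If the notebook has to run on several operating systems, the path can also be assembled from its parts. The following is only a sketch: the names ncfile_demo and file_found are made up for illustration, and the existence check is an optional convenience, not something the notebook requires.

```python
import os

# os.path.join inserts the separator that suits the current platform.
ncfile_demo = os.path.join('data', 'skt.mon.mean.nc')

# Checking up front gives a clearer message than a failed open later on.
file_found = os.path.isfile(ncfile_demo)
if not file_found:
    print('File not found, download it first:', ncfile_demo)
```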

3. Extract variables

Open a NetCDF file and print the file handle. The last few lines of the output summarize the dimensions and variables of the file.

For example:

  • platform: Model
  • Conventions: COARDS
  • dimensions(sizes): lon(192), lat(94), time(687)
  • variables(dimensions): float32 lat(lat), float32 lon(lon), float64 time(time), float32 skt(time,lat,lon)

In [4]:
fh     = Dataset(ncfile, mode='r') # file handle, open in read only mode
print(fh)
fh.close() # close the file


<type 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
    title: 4x daily NMC reanalysis
    history: Tue Jul  6 00:05:45 1999: ncrcat skt.mon.mean.nc /Datasets/ncep.reanalysis.derived/surface_gauss/skt.mon.mean.nc /dm/dmwork/nmc.rean.ingest/combinedMMs/skt.mon.mean.nc
renamevars Fri Dec 18 12:16:41 1998 from airsst.mon.mean.nc
/home/hoop/crdc/cpreanjuke2farm/cpreanjuke2farm Mon Oct 23 21:04:20 1995 from air.sfc.gauss.85.nc
created 95/03/13 by Hoop (netCDF2.3)
    description: Data is from NMC initialized reanalysis
(4x/day).  It consists of T42  variables interpolated to
pressure surfaces from model (sigma) surfaces.
    platform: Model
    Conventions: COARDS
    dimensions(sizes): lon(192), lat(94), time(687)
    variables(dimensions): float32 lat(lat), float32 lon(lon), float64 time(time), float32 skt(time,lat,lon)
    groups: 


In [5]:
fh     = Dataset(ncfile, mode='r') # file handle, open in read only mode
lon    = fh.variables['lon'][:]
lat    = fh.variables['lat'][:]
nctime = fh.variables['time'][:]
t_unit = fh.variables['time'].units
skt    = fh.variables['skt'][:]

try:
    t_cal = fh.variables['time'].calendar
except AttributeError: # attribute doesn't exist
    t_cal = u"gregorian" # or standard

fh.close() # close the file

4. Access the first and the last value of latitude


In [6]:
lat[0] # Caution! Python’s indexing starts with zero


Out[6]:
88.542

In [7]:
lat[-1] # gives the last value of the vector


Out[7]:
-88.542
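
The two outputs show that the latitudes run from north to south. A small sketch of the same indexing on made-up values (lat_demo is not the real grid) also checks the ordering explicitly:

```python
import numpy as np

# Made-up latitudes ordered north to south, like the file above.
lat_demo = np.linspace(80.0, -80.0, 9)

first = lat_demo[0]                     # index 0 is the first element
last = lat_demo[-1]                     # index -1 is the last element
same_last = lat_demo[len(lat_demo) - 1] # equivalent positive index

# All differences are negative when the values strictly decrease.
north_to_south = bool(np.all(np.diff(lat_demo) < 0))
print(first, last, north_to_south)
```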

5. Select a subregion

  • Lat: -50 ~ -90
  • Lon: 0 ~ 360

In [8]:
lat_so = lat[-21:] # slice to the end so the southernmost latitude is included
lon_so = lon
skt_so = skt[:,-21:,:]
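
Hard-coded index ranges depend on the grid resolution and on the ordering of the latitudes. A boolean mask built from the coordinate values is one alternative. The sketch below uses a made-up 10-degree grid (lat_demo, lon_demo and skt_demo are illustrative, not the reanalysis data):

```python
import numpy as np

# Made-up grid: latitudes 90, 80, ..., -90 and four longitudes.
lat_demo = np.arange(90.0, -91.0, -10.0)
lon_demo = np.arange(0.0, 360.0, 90.0)
skt_demo = np.zeros((2, lat_demo.size, lon_demo.size))  # (time, lat, lon)

# True wherever the latitude lies at or south of 50S.
south = lat_demo <= -50.0

lat_sub = lat_demo[south]
skt_sub = skt_demo[:, south, :]  # apply the mask along the lat axis
print(lat_sub, skt_sub.shape)
```

Because the mask is derived from the values, the selection stays correct if the grid is flipped or has a different resolution.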

6. Save subregion data

Save the subregion data (several arrays) into a single file in uncompressed .npz format using np.savez.


In [9]:
np.savez('data/skt.so.mon.mean.npz', skt_so=skt_so, lat_so=lat_so, lon_so=lon_so)
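
When file size matters, np.savez_compressed writes the same .npz layout with compression. The sketch below compares both calls on a made-up array in a temporary folder. An all-zero array compresses far better than real data would, so the size ratio is not representative.

```python
import os
import tempfile

import numpy as np

# Made-up array with the (time, lat, lon) layout used above.
skt_demo = np.zeros((12, 20, 192), dtype='f4')

tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, 'plain.npz')
packed_path = os.path.join(tmpdir, 'packed.npz')

np.savez(plain_path, skt_so=skt_demo)              # uncompressed
np.savez_compressed(packed_path, skt_so=skt_demo)  # compressed

plain_size = os.path.getsize(plain_path)
packed_size = os.path.getsize(packed_path)
print(plain_size, packed_size)
```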

You can load these data back with np.load.


In [10]:
npzfile = np.load('data/skt.so.mon.mean.npz')
npzfile.files


Out[10]:
['skt_so', 'lat_so', 'lon_so']
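
Each array is retrieved from the loaded object by the keyword it was saved under. A minimal round-trip sketch on made-up arrays (the temporary file demo.npz is hypothetical):

```python
import os
import tempfile

import numpy as np

lat_demo = np.array([-50.0, -60.0, -70.0])
skt_demo = np.ones((2, 3, 4))

path = os.path.join(tempfile.mkdtemp(), 'demo.npz')
np.savez(path, skt_so=skt_demo, lat_so=lat_demo)

# The with-block closes the underlying file once the arrays are read.
with np.load(path) as npz:
    names = sorted(npz.files)
    skt_back = npz['skt_so']  # arrays are read from disk on access
    lat_back = npz['lat_so']

print(names, skt_back.shape)
```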

References

http://unidata.github.io/netcdf4-python/

John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007), DOI:10.1109/MCSE.2007.55

Stéfan van der Walt, S. Chris Colbert and Gaël Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22-30 (2011), DOI:10.1109/MCSE.2011.37

Kalnay et al., The NCEP/NCAR 40-year reanalysis project, Bull. Amer. Meteor. Soc., 77, 437-470, 1996.

